The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Han, Pengrui, Kocielnik, Rafal, Song, Peiyang, Debnath, Ramit, Mobbs, Dean, Anandkumar, Anima, Alvarez, R. Michael
Personality traits have long been studied as predictors of human behavior. Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems, with advanced LLMs displaying consistent behavioral tendencies resembling human traits like agreeableness and self-regulation. Understanding these patterns is crucial, yet prior work primarily relied on simplified self-reports and heuristic prompting, with little behavioral validation. In this study, we systematically characterize LLM personality across three dimensions: (1) the dynamic emergence and evolution of trait profiles throughout training stages; (2) the predictive validity of self-reported traits in behavioral tasks; and (3) the impact of targeted interventions, such as persona injection, on both self-reports and behavior. Our findings reveal that instructional alignment (e.g., RLHF, instruction tuning) significantly stabilizes trait expression and strengthens trait correlations in ways that mirror human data. However, these self-reported traits do not reliably predict behavior, and observed associations often diverge from human patterns. While persona injection successfully steers self-reports in the intended direction, it exerts little or inconsistent effect on actual behavior. By distinguishing surface-level trait expression from behavioral consistency, our findings challenge assumptions about LLM personality and underscore the need for deeper evaluation in alignment and interpretability.
Advancing Crime Linkage Analysis with Machine Learning: A Comprehensive Review and Framework for Data-Driven Approaches
Lima, Vinicius, Karabiyik, Umit
Crime linkage is the process of analyzing criminal behavior data to determine whether a pair or group of crime cases are connected or belong to a series of offenses. This domain has been extensively studied by researchers in sociology, psychology, and statistics. More recently, it has drawn interest from computer scientists, especially with advances in artificial intelligence. Despite this, the literature indicates that work in this latter discipline is still in its early stages. This study aims to understand the challenges faced by machine learning approaches in crime linkage and to support foundational knowledge for future data-driven methods. To achieve this goal, we conducted a comprehensive survey of the main literature on the topic and developed a general framework for crime linkage processes, thoroughly describing each step. Our goal was to unify insights from diverse fields into a shared terminology to enhance the research landscape for those intrigued by this subject.
MassSpecGym: A benchmark for the discovery and identification of molecules
Bushuiev, Roman, Bushuiev, Anton, de Jonge, Niek F., Young, Adamo, Kretschmer, Fleming, Samusevich, Raman, Heirman, Janne, Wang, Fei, Zhang, Luke, Dührkop, Kai, Ludwig, Marcus, Haupt, Nils A., Kalia, Apurva, Brungs, Corinna, Schmid, Robin, Greiner, Russell, Wang, Bo, Wishart, David S., Liu, Li-Ping, Rousu, Juho, Bittremieux, Wout, Rost, Hannes, Mak, Tytus D., Hassoun, Soha, Huber, Florian, van der Hooft, Justin J. J., Stravs, Michael A., Böcker, Sebastian, Sivic, Josef, Pluskal, Tomáš
The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.
A Review of Artificial Intelligence based Biological-Tree Construction: Priorities, Methods, Applications and Trends
Zang, Zelin, Xu, Yongjie, Duan, Chenrui, Wu, Jinlin, Li, Stan Z., Lei, Zhen
Biological tree analysis serves as a pivotal tool in uncovering the evolutionary and differentiation relationships among organisms, genes, and cells. Its applications span diverse fields including phylogenetics, developmental biology, ecology, and medicine. Traditional tree inference methods, while foundational in early studies, face increasing limitations in processing the large-scale, complex datasets generated by modern high-throughput technologies. Recent advances in deep learning offer promising solutions, providing enhanced data processing and pattern recognition capabilities. However, challenges remain, particularly in accurately representing the inherently discrete and non-Euclidean nature of biological trees. In this review, we first outline the key biological priors fundamental to phylogenetic and differentiation tree analyses, facilitating a deeper interdisciplinary understanding between deep learning researchers and biologists. We then systematically examine the commonly used data formats and databases, serving as a comprehensive resource for model testing and development. We provide a critical analysis of traditional tree generation methods, exploring their underlying biological assumptions, technical characteristics, and limitations. Current developments in deep learning-based tree generation are reviewed, highlighting both recent advancements and existing challenges. Furthermore, we discuss the diverse applications of biological trees across various biological domains. Finally, we propose potential future directions and trends in leveraging deep learning for biological tree research, aiming to guide further exploration and innovation in this field.
Indirect Dynamic Negotiation in the Nash Demand Game
Guy, Tatiana V., Homolová, Jitka, Gaj, Aleksej
Politics and business are considered traditional spheres of human negotiation. The internet and modern means of communication have extended human negotiation to new domains such as social networks, deliberative democracy, e-commerce, and cloud-based applications [1], [2]. Besides, automatic bargaining and negotiation, being inevitable in modern cyber-physical-social systems [3], have been established in a variety of applications, like network negotiation, ... goods/service characterised by several, possibly interrelated, attributes (say price of a product and terms of its delivery); ii) limited negotiation time, as no agent can deliberate infinitely; iii) absence of a moderator to coordinate the negotiation, so agents must reach agreement themselves [11]. The negotiation has been widely addressed in diverse fields ranging from economy and sociology to computer science.
3D Data Long-Term Preservation in Cultural Heritage
Amico, Nicola, Felicetti, Achille
In digital heritage, effective management and preservation of digital data are crucial. Issues such as file corruption, media obsolescence, and inadequate metadata must be addressed, alongside data migration when software becomes outdated and thorough data curation to aid current and future researchers in searching, citing, and reusing historical data. Merely archiving or backing up project data is not enough for long-term preservation. It is essential to ensure that primary data remain reusable, compatible with evolving operating systems, and accompanied by comprehensive metadata detailing their creation and history [1]. Despite the advantage of heritage datasets being "born digital," they are still susceptible to loss if file associations and metadata are not properly maintained. The large volume of data generated from digital projects and the often limited understanding of file associations among project members jeopardise the future reuse of archaeological data if not well-organised or curated. Enhancing workflows to include both metadata authorship and preservation is vital to prevent information loss and digital data obsolescence. Particularly, the long-term preservation of 3D datasets requires maintaining each file in a usable and uncorrupted state. Files undergo several modifications, changing formats during the creation of the final scan or 3D model, known as an asset.